Correction of Medical Handwriting OCR Based on Semantic Similarity
Internal identifier: 000E75 (Main/Exploration); previous: 000E74; next: 000E76
Authors: Bartosz Broda [Poland]; Maciej Piasecki [Poland]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2007.
Abstract
This paper presents a method for correcting handwriting Optical Character Recognition (OCR) output based on semantic similarity. Several ways of extracting semantic similarity measures from a corpus are analysed; the best results were achieved by combining the text window context with the Rank Weight Function. An algorithm that selects the word sequence with high internal similarity is proposed. The method was trained on and applied to a corpus of real medical documents written in Polish.
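The abstract does not describe the selection algorithm in detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the OCR stage yields a list of candidate words for each position and that a pairwise semantic similarity function is available. The names `sequence_score`, `select_sequence`, and `toy_similarity`, the exhaustive search, and the similarity values are all hypothetical and chosen purely for illustration.

```python
from itertools import product

def sequence_score(words, similarity):
    """Internal similarity: sum of pairwise similarities over all word pairs."""
    total = 0.0
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            total += similarity(words[i], words[j])
    return total

def select_sequence(candidates, similarity):
    """Choose one candidate per position so that internal similarity is maximal.

    Exhaustive search over all combinations; an assumption made for clarity,
    practical only for short sequences with few candidates per position.
    """
    return max(product(*candidates),
               key=lambda seq: sequence_score(seq, similarity))

# Invented similarity values, for illustration only (not from the paper).
TOY_SIM = {
    frozenset(("heart", "rate")): 0.9,
    frozenset(("heart", "date")): 0.1,
    frozenset(("heat", "rate")): 0.3,
    frozenset(("heat", "date")): 0.2,
}

def toy_similarity(a, b):
    """Look up a symmetric similarity; unknown pairs score 0.0."""
    return TOY_SIM.get(frozenset((a, b)), 0.0)

# Hypothetical OCR output: two positions, two candidate readings each.
candidates = [["heat", "heart"], ["date", "rate"]]
best = select_sequence(candidates, toy_similarity)
print(best)
```

In a real setting the similarity function would be derived from a corpus, for example from co-occurrence within a text window as the abstract mentions, and the exhaustive search would likely need to be replaced by a more efficient strategy.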
DOI: 10.1007/978-3-540-77226-2_45
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000165
- to stream Istex, to step Curation: 000163
- to stream Istex, to step Checkpoint: 000890
- to stream Main, to step Merge: 000E88
- to stream Main, to step Curation: 000E75
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Correction of Medical Handwriting OCR Based on Semantic Similarity</title>
<author><name sortKey="Broda, Bartosz" sort="Broda, Bartosz" uniqKey="Broda B" first="Bartosz" last="Broda">Bartosz Broda</name>
</author>
<author><name sortKey="Piasecki, Maciej" sort="Piasecki, Maciej" uniqKey="Piasecki M" first="Maciej" last="Piasecki">Maciej Piasecki</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:DB2FFB6B2516E71CF28B2B430EE662F7753696EE</idno>
<date when="2007" year="2007">2007</date>
<idno type="doi">10.1007/978-3-540-77226-2_45</idno>
<idno type="url">https://api.istex.fr/document/DB2FFB6B2516E71CF28B2B430EE662F7753696EE/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000165</idno>
<idno type="wicri:Area/Istex/Curation">000163</idno>
<idno type="wicri:Area/Istex/Checkpoint">000890</idno>
<idno type="wicri:doubleKey">0302-9743:2007:Broda B:correction:of:medical</idno>
<idno type="wicri:Area/Main/Merge">000E88</idno>
<idno type="wicri:Area/Main/Curation">000E75</idno>
<idno type="wicri:Area/Main/Exploration">000E75</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Correction of Medical Handwriting OCR Based on Semantic Similarity</title>
<author><name sortKey="Broda, Bartosz" sort="Broda, Bartosz" uniqKey="Broda B" first="Bartosz" last="Broda">Bartosz Broda</name>
<affiliation wicri:level="1"><country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Informatics, Wrocław University of Technology</wicri:regionArea>
<wicri:noRegion>Wrocław University of Technology</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Pologne</country>
</affiliation>
</author>
<author><name sortKey="Piasecki, Maciej" sort="Piasecki, Maciej" uniqKey="Piasecki M" first="Maciej" last="Piasecki">Maciej Piasecki</name>
<affiliation wicri:level="1"><country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Informatics, Wrocław University of Technology</wicri:regionArea>
<wicri:noRegion>Wrocław University of Technology</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Pologne</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2007</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">DB2FFB6B2516E71CF28B2B430EE662F7753696EE</idno>
<idno type="DOI">10.1007/978-3-540-77226-2_45</idno>
<idno type="ChapterID">45</idno>
<idno type="ChapterID">Chap45</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In the paper a method of the correction of handwriting Optical Character Recognition (OCR) based on the semantic similarity is presented. Different versions of the extraction of semantic similarity measures from a corpus are analysed, with the best results achieved for the combination of the text window context and Rank Weight Function. An algorithm of the word sequence selection with the high internal similarity is proposed. The method was trained and applied to a corpus of real medical documents written in Polish.</div>
</front>
</TEI>
<affiliations><list><country><li>Pologne</li>
</country>
</list>
<tree><country name="Pologne"><noRegion><name sortKey="Broda, Bartosz" sort="Broda, Bartosz" uniqKey="Broda B" first="Bartosz" last="Broda">Bartosz Broda</name>
</noRegion>
<name sortKey="Broda, Bartosz" sort="Broda, Bartosz" uniqKey="Broda B" first="Bartosz" last="Broda">Bartosz Broda</name>
<name sortKey="Piasecki, Maciej" sort="Piasecki, Maciej" uniqKey="Piasecki M" first="Maciej" last="Piasecki">Maciej Piasecki</name>
<name sortKey="Piasecki, Maciej" sort="Piasecki, Maciej" uniqKey="Piasecki M" first="Maciej" last="Piasecki">Maciej Piasecki</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000E75 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000E75 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:DB2FFB6B2516E71CF28B2B430EE662F7753696EE |texte= Correction of Medical Handwriting OCR Based on Semantic Similarity }}
This area was generated with Dilib version V0.6.32.